Querying Web-Accessible Life Science Sources: Which paths to choose?
Authors
Abstract
Web-accessible life science sources form a complex graph of overlapping sources with multiple alternate links between them. A (navigational) query may be answered by traversing multiple alternate paths between a start source and a target source. Each of these paths may yield a different benefit, e.g., the cardinality of result objects that are reached in the target source. Paths may also differ in their cost of evaluation, i.e., the execution cost of a query evaluation plan for a path. Finally, since the result objects of alternate paths may overlap, the benefits of two paths are not independent. In this context, we present two problems. The first problem is to determine the K-best paths, or Relevant Paths, with low cost and high benefit. The second problem is to choose a good combination of top-k (possibly overlapping) paths. While the first problem regards paths individually and finds the best ones among a vast number of paths, the second problem assesses the integrated result of a set of paths. Further, we discuss the interrelation between the two problems and motivate the importance of a practical solution.
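The difference between the two problems can be illustrated with a minimal sketch. It is not the method of the paper: the path names, costs, and result sets are hypothetical, benefit is assumed to be the number of result objects, individual ranking is assumed to use benefit per unit cost, and the combination is chosen with a simple greedy heuristic on marginal benefit.

```python
from typing import NamedTuple


class Path(NamedTuple):
    name: str            # hypothetical path label
    cost: float          # assumed evaluation cost of the path
    results: frozenset   # result objects reached in the target source


def k_best_paths(paths, k):
    """Problem 1 (sketch): rank each path on its own by benefit per unit cost."""
    ranked = sorted(paths, key=lambda p: len(p.results) / p.cost, reverse=True)
    return ranked[:k]


def greedy_combination(paths, k):
    """Problem 2 (sketch): pick up to k paths by marginal benefit per unit cost.

    Marginal benefit counts only result objects not already covered,
    so overlapping paths are penalized.
    """
    chosen, covered = [], set()
    remaining = list(paths)
    for _ in range(k):
        best = max(
            remaining,
            key=lambda p: len(p.results - covered) / p.cost,
            default=None,
        )
        if best is None or not (best.results - covered):
            break  # no remaining path adds new result objects
        chosen.append(best)
        covered |= best.results
        remaining.remove(best)
    return chosen, covered


# Hypothetical example: paths A and B overlap heavily, C is disjoint.
paths = [
    Path("A", 1.0, frozenset({1, 2, 3, 4})),
    Path("B", 1.0, frozenset({1, 2, 3})),
    Path("C", 1.0, frozenset({5, 6})),
]

individual = k_best_paths(paths, 2)
combination, covered = greedy_combination(paths, 2)
print([p.name for p in individual])   # paths ranked individually
print([p.name for p in combination])  # paths chosen as a combination
print(len(covered))                   # distinct result objects in the combination
```

In this toy data the two individually best paths overlap almost entirely, while the greedy combination trades the second-ranked path for a disjoint one and reaches more distinct result objects, which is the dependence between path benefits that the abstract points out.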
Similar resources
Querying XML Sources Using an Ontology-based Mediator
In this paper we propose a mediator architecture for the querying and integration of Web-accessible XML data sources. Our contributions are (i) the definition of a simple but expressive mapping language, following the local as view approach and describing XML resources as local views of some global schema, and (ii) efficient algorithms for rewriting user queries according to existing source des...
Efficient Techniques to Explore and Rank Paths in Life Science Data Sources
Life science data sources represent a complex link-driven federation of publicly available Web accessible sources. A fundamental need for scientists today is the ability to completely explore all relationships between scientific classes, e.g., genes and citations, that may be retrieved from various data sources. A challenge to such exploration is that each path between data sources potentially ...
Querying Heterogeneous Mediated Sources: A Survey
Data integration systems allow access to information in increasingly different forms: relational databases, spreadsheets, web pages, and so on. Querying such heterogeneous sources is challenging due to non-uniform query capability of sources, variety of schema and data models, and limitations on access paths. Most systems use some form of mediation to allow access to heterogeneous sources. Some...
A User-Centric Framework for Accessing Biological Sources and Tools
Biologists face two problems in interpreting their experiments: the integration of their data with information from multiple heterogeneous sources and data analysis with bioinformatics tools. It is difficult for scientists to choose between the numerous sources and tools without assistance. Following a thorough analysis of scientists’ needs during the querying process, we found that biologists ...
Efficiently Querying Moving Objects with Pre-defined Paths in a Distributed Environment
Due to the recent growth of the World Wide Web, numerous spatio-temporal applications can obtain their required information from publicly available web sources. We consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. Examples of such data sources are network...